ABSTRACT

Every day millions of users are connected through online social 
networks, generating a rich trove of data that allows us to study the 
mechanisms behind human interactions. Triadic closure has been 
treated as the major mechanism for creating social links: if Al- 
ice follows Bob and Bob follows Charlie, Alice will follow Char- 
lie. Here we present an analysis of longitudinal micro-blogging 
data, revealing a more nuanced view of the strategies employed by 
users when expanding their social circles. While the network struc- 
ture affects the spread of information among users, the network 
is in turn shaped by this communication activity. This suggests a 
link creation mechanism whereby Alice is more likely to follow 
Charlie after seeing many messages by Charlie. We characterize 
users with a set of parameters associated with different link creation 
strategies, estimated by a Maximum-Likelihood approach. Triadic 
closure does have a strong effect on link formation, but shortcuts 
based on traffic are another key factor in interpreting network evo- 
lution. However, individual strategies for following other users are 
highly heterogeneous. Link creation behaviors can be summarized 
by classifying users in different categories with distinct structural 
and behavioral characteristics. Users who are popular, active, and 
influential tend to create traffic-based shortcuts, making the infor- 
mation diffusion process more efficient in the network. 
1. INTRODUCTION

User activity in online social networks is exploding. Social and
micro-blogging networks such as Facebook, Twitter, and Google
Plus host the information-sharing activity of billions of users
every day. Using these systems, people communicate ideas, opinions,
videos, and photos among their circles of friends and followers
across the world. These interactions generate an unprecedented
amount of data that can be used as a social observatory, providing
a unique opportunity to shed light on the mechanisms of human
communication with a quantitative approach [33, 13, 31, 23, 52, 53].

Figure 1: The dynamics of and on the network are strongly
coupled. The bottom layer illustrates the social network structure,
where the blue arrows represent "follow" relationships
with the direction of information flow. The dashed red arrow
marks a newly created link. The upper layer depicts the flow of
information between people in the same group, leading to the
creation of the new link.

Research on social media revolves around two main themes: com- 
munication and its social network substrate. Most network models 
focus on either the structural growth of the system — the dynamics 
of the network — or information diffusion processes — the dynam- 
ics on the network. The present work establishes a feedback loop 
between these two dynamics. 

Much effort has been devoted to modeling the evolution of social
networks [55, 7, 39, 23]. Among the proposed link creation mechanisms,
triadic closure [51, 24] is a simple but powerful
principle for modeling the evolution of social networks based on shared
friends: two individuals with mutual friends have a higher-than-random
chance of establishing a link. In directed networks, such as Twitter
or Yahoo! Meme, triadic closure implies a particular order with
respect to the direction of links: once Alice follows Bob and Bob
follows Charlie, Alice will follow Charlie. Triadic closure has been
observed in both undirected and directed online social networks
and incorporated into several network growth models [34, 29, 50, 47].
However, most existing models do not take user activity, that is,
how information spreads on the network, into consideration.

Social micro-blogging networks, such as Twitter, Google Plus,
Sina Weibo, and Yahoo! Meme, are designed for information sharing.
As illustrated in Fig. 1, the social network structure constrains
communication patterns, but information propagated through the
network also affects how agents behave and ultimately how the network
changes and grows. In this paper we study the role of information
diffusion in shaping the evolution of the network structure,
and the individual strategies that bring about this effect by way of
creating social links.

The major contribution of this paper is to present clear evidence
that information diffusion affects network evolution at both system-wide
and individual levels. In particular, we find that a considerable
portion of new links are shortcuts based on information flow
(§4). There is significant statistical evidence for triadic closure as
a link creation mechanism, but also that users tend to link to people
who have generated content they have seen (§4.1). Furthermore,
not all users apply the same strategy to grow their social connections;
users with high in-degree tend to pay more attention to traffic
(§4.2). Moreover, shortcuts are not equally probable: we find that
users follow the most active sources of content, and purely topological
mechanisms cannot account for these shortcuts (§4.3). As a result,
traffic-based shortcuts can make the social network more efficient
in terms of information diffusion (§4.4). In §5 we perform a Maximum
Likelihood Estimation analysis to quantify the system-wide
prevalence of different link creation strategies. Finally, the categorization
of users suggests the existence of several distinct link
formation behaviors (§6). Our findings identify information diffusion
dynamics as a key factor in the evolution of social networks.

2. BACKGROUND 

Early models concerning communication dynamics were inspired
by studies of epidemics, assuming that a piece of information could
pass from one individual to another through social contacts
[45, 21, 15, 3, 2]. These models have been extended to include cascade
phenomena [22], factors that influence the speed of spreading
such as information recency [36], the heterogeneity in connectivity
patterns [42], clustering [40], user-created content [5], and temporal
connectivity patterns [37, 9, 10, 44]. An alternative class
of models is based on the idea of a threshold: you propagate an
idea when some number of friends communicate it to you [25, 38].
These models are believed to be relevant in the diffusion of rumors,
norms, and behaviors [12], and have been extended to study the role
of competition for finite attention [57]. The large majority of these
studies consider either a static or annealed underlying social network,
under the assumption that the network evolves on a longer
(slower) time scale than the information spread. Recent research
has addressed the modeling of intermediate cases, in which the two
time scales are comparable. These approaches consider the two
dynamics as either independent [46, 43] or coupled [54, 49]. The
foundations of this last class of models are very similar to those explored
in this paper. However, thus far, these models have focused
mainly on epidemic processes in which links are deleted or rewired
according to the disease status of each node [54, 49]. The social
systems considered in this paper are governed by quite different
underlying mechanisms.

Models devoted to reproducing the growth and evolution of network
topology have traditionally focused on defining basic mechanisms
driving link creation [55, 39, 7]. Since the first model proposed
in 1959 by Erdős and Rényi [18], many others have been introduced,
capturing different properties observed in real networks,
such as the small-world phenomenon [56, 34, 29, 50, 47], large
clustering coefficient [56, 34, 29, 50, 47], temporal dynamics [44, 46],
and heterogeneous distributions in connectivity patterns
[6, 28, 32, 30, 17, 19]. In particular, this latter property was first described
by the preferential attachment [6] and copy models [32].

In the social context, the rationale behind preferential attachment
mechanisms is that people prefer linking to well-connected individuals
[6]. Although very popular, this prescription alone is not
sufficient to reproduce other important features of social networks.
Other models have been put forth to fill this gap, including ingredients
such as homophily [27, 35, 41, 1, 20] and triadic closure
[51, 24, 34, 29, 50, 47].

Homophily describes the tendency of people to connect with others
sharing similar features [35, 27]. Its impact on link creation in
large-scale online networks is a recent topic of discussion [41, 1, 20].

The triadic closure mechanism is based on the intuition that two
individuals with mutual friends have a higher probability of establishing
a link [51, 24]. This tendency has been observed in both
undirected and directed online social networks and incorporated
into several network growth models [34, 29, 50, 47]. In particular,
Leskovec et al. have tested triadic closure against many other
mechanisms in four different large-scale social networks [34]. By
using Maximum Likelihood Estimation (MLE) [14] they have identified
triadic closure as the best rule, among those considered, to
explain link creation.

Although similar in spirit, our approach is different from this
large body of literature. We do not consider agent-based simulations
in which the structural behavior of each user is modeled by
a set of rules. Instead, we adopt the MLE framework, extending the work
of Leskovec et al. [34]. We extend the notion of triadic closure by
considering mechanisms based on traffic or, more generally, user
activity. We explicitly study the coupling between the dynamics
of and on the network, connecting these two previously separated
themes of research in the context of online social networks.

3. MEME DATASET

We study Yahoo! Meme, a social micro-blogging system simi- 
lar to Twitter, which was active between 2009 and 2012. We have 
access to the entire history of the system, including full records 
of every message propagation and link creation event, from April 
2009 until March 2010. A user j following a user i is represented
in the follower network by a directed edge ℓ = (i, j), indicating that j
can receive messages posted by i. We adopt this notation, in which
the link creator is the target, to emphasize the direction of information
flow. Edges are directed to account for asymmetric relations
between users; a node can follow another without being followed
back. In our notation, the in-degree of a node i is the number of
people followed by i, and the out-degree is the number of i's followers.
Users can repost received messages, which become visible
to their followers. When user j reposts content from i, we infer
a flow of information from i to j. Each link is weighted by the
number of messages from i that are reposted or seen by j.
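
The edge convention above is easy to get backwards, so a minimal sketch may help. The class and method names below are hypothetical, and the single traffic counter per edge is a simplification (the text distinguishes messages seen from messages reposted); this is an illustration of the notation, not the data structure used in the study.

```python
from collections import defaultdict


class FollowerNetwork:
    """Directed follower network; an edge (i, j) means that j follows i."""

    def __init__(self):
        self.followers = defaultdict(set)  # i -> users who follow i
        self.followees = defaultdict(set)  # j -> users whom j follows
        self.weight = defaultdict(int)     # (i, j) -> messages from i reaching j

    def add_follow(self, i, j):
        """Record that j starts following i, i.e. add the edge (i, j)."""
        self.followers[i].add(j)
        self.followees[j].add(i)

    def record_traffic(self, i, j, n_messages=1):
        """Increase the weight of edge (i, j) by n_messages."""
        self.weight[(i, j)] += n_messages

    def in_degree(self, user):
        """Number of people followed by `user` (edges pointing into the user)."""
        return len(self.followees[user])

    def out_degree(self, user):
        """Number of followers of `user` (edges leaving the user)."""
        return len(self.followers[user])


net = FollowerNetwork()
net.add_follow("bob", "alice")      # alice follows bob
net.add_follow("charlie", "alice")  # alice follows charlie
net.add_follow("charlie", "bob")    # bob follows charlie
net.record_traffic("bob", "alice", 3)
```

In this toy network alice has in-degree 2 because she follows two people, while charlie has out-degree 2 because two people follow him.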

At the end of the observation period, the Yahoo! Meme follower
network consisted of 128,199 users with at least one edge,
connected by a total of 3,485,361 directed edges. Fig. 2 displays
general statistics about the growth and structure of the network.

4. LINK CREATION MECHANISMS 

When users post or repost messages, all their followers can see
these posts and might decide to repost them, generating paths that
together form cascade networks. When receiving a reposted message,
a user in such a path can see both the grandparent (G, the user
two steps ahead in the path) and the origin (O, the original source). A
user may decide to follow a grandparent or origin, receiving their
future messages directly. These new links create shortcuts connecting
users at any distance in the network. A triadic closure occurs
when a user follows a triadic node (Δ, the user two steps away
in the follower network). The definitions of the different types of link
creation mechanisms are illustrated in Fig. 3(A).

Figure 2: General statistics of the Yahoo! Meme system.
(A) The growth of the system in time: the number of users
(red circles), links (blue squares), and messages (green triangles).
(B) Broad distributions of in-degree and out-degree in
the follower network of Yahoo! Meme. Users were not allowed
to follow more than 1,000 people, which is the maximum
in-degree a node can attain.

Figure 3: (A) Illustration of the link creation mechanisms.
(B) Venn diagram of the proportions of grandparent, origin, and
triadic closure links among all existing edges.
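
The definitions of grandparent, origin, and triadic node given in this section can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names are hypothetical, a repost path is assumed to be a list of users starting at the origin, and a user is assumed to see a grandparent and origin only when sitting at least two steps from the start of a path.

```python
def triadic_nodes(followees, user):
    """Users two steps away from `user` in the follower network.

    `followees` maps each user to the set of users they follow.
    """
    direct = followees.get(user, set())
    two_steps = set()
    for friend in direct:
        two_steps |= followees.get(friend, set())
    # Exclude the user and the people the user already follows.
    return two_steps - direct - {user}


def seen_sources(paths, user):
    """Grandparents and origins that `user` has seen in repost paths."""
    grandparents, origins = set(), set()
    for path in paths:
        if user not in path:
            continue
        pos = path.index(user)
        if pos >= 2:  # the user received a repost, not the original post
            grandparents.add(path[pos - 2])
            origins.add(path[0])
    return grandparents - {user}, origins - {user}


def classify_link(followees, paths, creator, target):
    """Return the set of mechanisms that could explain creator -> target."""
    grandparents, origins = seen_sources(paths, creator)
    labels = set()
    if target in grandparents:
        labels.add("G")
    if target in origins:
        labels.add("O")
    if target in triadic_nodes(followees, creator):
        labels.add("triadic")
    return labels


# alice follows bob, bob follows charlie, charlie follows dave.
followees = {"alice": {"bob"}, "bob": {"charlie"}, "charlie": {"dave"}}
# dave posts; charlie, bob and then alice receive the message in turn.
paths = [["dave", "charlie", "bob", "alice"]]
```

Here a link from alice to charlie is both a grandparent shortcut and a triadic closure, while a link from alice to dave is an origin shortcut only. A single link can therefore carry several labels, which is why the sets of link types overlap.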

The Venn diagram in Fig. 3(B) shows the proportions of links of
different types and the logical relationships between these sets of
links. We observe that 84.8% of new edges are triadic closures,
21.5% form shortcuts to grandparents, and 19.5% to origins.
Note that not all grandparents are triadic nodes, because users
are allowed to repost messages from people they are not following
in Yahoo! Meme. This accounts for 0.03% of links. There is a large
overlap between triadic closure links and traffic-based shortcuts.
This can be explained by the fact that most real-world information
cascades are shallow [4], and thus triadic closure links
and traffic-based shortcuts coincide.

This evidence suggests that traffic-based link creation mecha- 
nisms are an important complement to the triadic closure in model- 
ing network evolution. Actions of posting and reposting induce the 
creation of shortcuts, shaping the structure of the network. Newly 
created links in turn determine what messages are seen by users, 
making the network more efficient at spreading information. 

4.1 Statistical Analyses of Shortcuts 

To quantify the statistical tendency of users to create shortcuts, 
let us consider every single link creation in the data as an indepen- 
dent event. We test the null hypotheses that links to grandparents, 
origins, and triadic nodes are generated by choosing targets at ran- 
dom among the users not already followed by the creator. 

We label each link ℓ by its creation order, 1 ≤ ℓ ≤ L, where
L is the total number of links. For each link, we can compute the
likelihood of following a grandparent by chance:

\[ p_G(\ell) = \frac{N_G(\ell)}{\ell - k(\ell) - 1} \]

where N_G(ℓ) is the number of distinct grandparents seen by the
creator of ℓ at the moment when ℓ is about to be created; k(ℓ) is the
in-degree of ℓ's creator at the same moment; and the denominator
is the number of potential candidates to be followed. The indicator
function for each link ℓ denotes whether the link connects with a
grandparent or not in the real data:

\[ \mathbf{1}_G(\ell) = \begin{cases} 1 & \text{if } \ell \text{ links to a grandparent} \\ 0 & \text{otherwise.} \end{cases} \]

The expected number of links to grandparents according to the null
hypothesis can then be computed as:

\[ E_G = \sum_{\ell=1}^{L} p_G(\ell) \]

and its variance is given by:

\[ \sigma_G^2 = \sum_{\ell=1}^{L} p_G(\ell)\,\bigl(1 - p_G(\ell)\bigr) \]

while the corresponding empirical number is:

\[ S_G = \sum_{\ell=1}^{L} \mathbf{1}_G(\ell). \]
Figure 4: Individual preferences for following grandparents
(red circles), origins (blue squares) and triadic nodes (green triangles)
change with the in-degree of the link creator.

According to the Lyapunov central limit theorem,¹ the variable
z_G = (S_G − E_G)/σ_G is distributed according to a standard normal
N(0, 1). For linking to origins (O) or triadic nodes (Δ), we
can define z_O and z_Δ similarly. In all three cases, using a z-test, we
can reject the null hypotheses with high confidence (p < 10^^"). 
We conclude that links established by following grandparents, ori- 
gins or triadic nodes happen much more frequently than by random 
connection. These link creation mechanisms have important roles 
in the evolution of the social network. 
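
The test statistic of this section can be sketched in a few lines. The function names and the synthetic numbers below are assumptions made for illustration only; the per-link null probabilities would in practice come from the data, and the two-sided p-value is one common way to read a z-score, not necessarily the exact procedure used here.

```python
import math


def shortcut_z_score(p_null, is_shortcut):
    """z-score of the observed shortcut count against the random-target null.

    p_null[n] is the chance that link n hits a grandparent at random, and
    is_shortcut[n] is 1 if link n actually points to a grandparent, else 0.
    """
    expected = sum(p_null)                          # E_G
    variance = sum(p * (1.0 - p) for p in p_null)   # sigma_G squared
    observed = sum(is_shortcut)                     # S_G
    return (observed - expected) / math.sqrt(variance)


def two_sided_p_value(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic example: 1,000 links, each with a 1% chance of hitting a
# grandparent at random, of which 200 actually do.
p_null = [0.01] * 1000
is_shortcut = [1] * 200 + [0] * 800
z = shortcut_z_score(p_null, is_shortcut)
```

With an expected count of about 10 and an observed count of 200, the z-score is far in the tail of the standard normal, which is the situation the text describes for all three link types.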

4.2 User Preference 

To study the dependence of the link formation tendencies on the
different stages of an individual's lifetime, let us compute z_G, z_O,
and z_Δ for links created by users with in-degree k, that is, those
who are following k users at the time when the link is created.
Fig. 4 shows that the principle of triadic closure dominates user
behavior when one follows a small number of users (k < 75). In
the early stages, one does not receive much traffic, so it is natural to
follow people based on local social circles, consistent with triadic
closure. However, users who have been active for a long time and
have followed many people (k > 75) have more channels through
which they monitor traffic. This creates an opportunity to follow
others from whom they have seen messages in the past.

4.3 Traffic Bias 

Further inspection of the empirical data reveals that not all short- 
cuts are equally likely; users tend to follow those who have often 
been sources of seen messages. To investigate this, consider all new 
shortcuts to grandparents or origins. For each shortcut, we rank 
all the available grandparent or origin candidates according to how 
many of their messages have been seen by the creator prior to the 
link formation. We plot the probability of a followed grandparent 
or origin having a certain rank percentile in Fig. 5. The plot clearly
demonstrates that repeated exposure to content posted by a user
increases the probability of following that user. This is analogous
to the way in which we are more likely to adopt a piece of information
or behavior to which we are exposed multiple times [5, 12, 48, 26].
This observation shows that topology alone is insufficient
to explain the evolution of the network; activity patterns (the dynamics
on the network) are a necessary ingredient in describing
the formation of new links.

¹ Lyapunov's condition,

\[ \frac{1}{\sigma^4} \sum_{\ell=1}^{L} E\bigl[(X(\ell) - p(\ell))^4\bigr] \to 0 \quad \text{as } L \to \infty, \]

where X(ℓ) is a random Bernoulli variable with success probability
p(ℓ) [8], is consistent with numerical tests. Details are omitted
for brevity.

Figure 5: Probability density of followed grandparents (red
circles) or origins (blue squares) having a certain rank percentile.
Link targets are ranked so that the link creator has
seen more messages from a user with smaller rank percentile.
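
The ranking used for the traffic-bias analysis in this section can be sketched as follows. The function name is hypothetical, and breaking ties by user name is an assumption made only to keep the example deterministic; the text does not say how ties were handled.

```python
def rank_percentile(seen_counts, followed):
    """Rank percentile of the followed candidate among all candidates.

    `seen_counts` maps each grandparent or origin candidate to the number
    of its messages the link creator saw before forming the link. The
    candidate seen most often gets the smallest rank percentile.
    """
    ranked = sorted(seen_counts, key=lambda u: (-seen_counts[u], str(u)))
    rank = ranked.index(followed) + 1
    return 100.0 * rank / len(ranked)


# Four candidates; the creator has seen 12 messages from dave.
seen = {"dave": 12, "erin": 3, "frank": 1, "gina": 7}
top = rank_percentile(seen, "dave")
bottom = rank_percentile(seen, "frank")
```

If users followed candidates regardless of traffic, the percentiles of followed targets would be spread evenly; a concentration at small percentiles is what indicates a preference for frequently seen sources.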

4.4 Link Efficiency 

In information diffusion networks like Twitter and Yahoo! Meme, 
social links may have a key efficiency function of shortening the 
distance between information creators and consumers. An efficient 
link should be able to convey more information to the follower than 
others. Hence we define the efficiency of link ℓ as the average number
of posts seen or reposted through ℓ during one time unit after
its creation:

\[ \eta_{\mathrm{seen}}(\ell) = \frac{w_{\mathrm{seen}}(\ell)}{T - t(\ell)}, \qquad \eta_{\mathrm{repost}}(\ell) = \frac{w_{\mathrm{repost}}(\ell)}{T - t(\ell)} \]

where w(ℓ) is the number of messages seen or reposted through ℓ;
t(ℓ) is the time when ℓ was created; and T is the time of the last
action recorded in our dataset. Both seen and reposted messages
are considered, as they represent different types of traffic; the former
are what is visible to a user, and the latter are what a user is
willing to share. We compute the link efficiency of every grandparent,
origin, and triadic closure link. As shown in Fig. 6, both
grandparent and origin links exhibit higher efficiency than triadic 
closure links, irrespective of the type of traffic. By shortening the 
paths of information flows, more posts from the content generators 
reach the consumers. 
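
As a worked example of the efficiency measure, the sketch below divides a traffic count by the time a link has existed. The function name, the time unit (days), and the numbers are illustrative assumptions, not values from the dataset.

```python
def link_efficiency(n_messages, created_at, last_action_at):
    """Average number of messages carried by a link per time unit.

    n_messages is the count of messages seen (or reposted) through the
    link, created_at is the link creation time, and last_action_at is the
    time of the last recorded action in the dataset.
    """
    if last_action_at <= created_at:
        raise ValueError("the link must exist for a positive amount of time")
    return n_messages / (last_action_at - created_at)


# A link created on day 100 of a dataset whose last action is on day 350,
# through which 500 messages were seen and 25 were reposted.
eta_seen = link_efficiency(500, created_at=100, last_action_at=350)
eta_repost = link_efficiency(25, created_at=100, last_action_at=350)
```

Normalizing by the lifetime of the link keeps old and recent links comparable, since an old link has had more time to accumulate traffic.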

5. RULES OF NETWORK EVOLUTION 

To infer the different link creation strategies from the observed
data, we characterize users with a set of probabilities associated
with different actions, and approximate these parameters by
Maximum-Likelihood Estimation (MLE) [14]. For each link ℓ, we
know the actual creator and the target; we can thus compute the
likelihood f(ℓ | Γ, Θ) of the target being followed by the creator
according to a particular strategy Γ, given the network configuration
Θ at the time when ℓ is created. The likelihoods associated with
different strategies can be mixed according to the parameters to obtain
a model of link creation behavior. Finally, assuming that link
creation events are independent, we can derive the likelihood of
obtaining the empirical network from the model as the product of
the likelihoods associated with every link. The higher the value of the
likelihood function, the more accurate the model.



5.1 Single Strategies 

Let us consider five link creation mechanisms and their combi- 
nations: 

Random (Rand): follow a randomly selected user who is not yet 
followed. 

Triadic closure (Δ): follow a randomly selected triadic node.

Grandparent (G): follow a randomly selected grandparent.

Origin (O): follow a randomly selected origin.

Traffic shortcut (G ∪ O): follow a randomly selected grandparent
or origin.

To model link creation with a single strategy, we can use a parameter
p for the probability of using that strategy, while a random user
is followed with probability 1 − p.

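The calculation of maximum likelihood, taking the single strategy
of grandparents as an example, is as follows:

\[
\begin{aligned}
\mathcal{L}_G(p) &= \prod_{\ell=1}^{L} \bigl[\, p\, f(\ell \mid G, \Theta) + (1-p)\, f(\ell \mid \mathrm{Rand}, \Theta) \,\bigr] \\
&= \prod_{\mathbf{1}_G(\ell)=1} \left[ \frac{p}{N_G(\ell)} + \frac{1-p}{\ell - k(\ell) - 1} \right] \prod_{\mathbf{1}_G(\ell)=0} \frac{1-p}{\ell - k(\ell) - 1}.
\end{aligned}
\]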

Note that since a follow action can be ascribed to multiple strategies,
it can contribute to multiple terms in the log-likelihood expression.
For instance, a link could be counted in both f(ℓ | G, Θ)
and f(ℓ | Rand, Θ). For numerically stable computation, we maximize
the log-likelihood:

\[
\log \mathcal{L}_G(p) = \sum_{\mathbf{1}_G(\ell)=1} \ln \left[ \frac{p}{N_G(\ell)} + \frac{1-p}{\ell - k(\ell) - 1} \right] + \sum_{\mathbf{1}_G(\ell)=0} \ln \frac{1-p}{\ell - k(\ell) - 1}.
\]

Similar expressions of log-likelihood can be obtained for other strategies
(Δ, O, and G ∪ O).

It is not trivial to obtain the best p analytically, so we explore the
values of p ∈ (0, 1) numerically (Fig. 7). Triadic closure dominates
as a single strategy, with p_Δ ≈ 0.82, consistent with the
large number of triadic closure links observed in the data. Traffic-based
strategies alone account for about 20% of the links.
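
The numerical exploration of p for a single strategy can be sketched as a grid search over the log-likelihood. This is a minimal illustration under stated assumptions: the function names and the tuple layout are hypothetical, `n_candidates` stands in for the denominator of the random-choice term (the number of potential candidates), and the synthetic links are not drawn from the dataset.

```python
import math


def log_likelihood_single(p, links):
    """Log-likelihood of a single-strategy model with mixing parameter p.

    Each link is a tuple (is_g, n_g, n_candidates): whether the target is
    a grandparent, how many grandparents the creator had seen, and how
    many users were available to be followed at random.
    """
    total = 0.0
    for is_g, n_g, n_candidates in links:
        likelihood = (1.0 - p) / n_candidates  # random-choice term
        if is_g:
            likelihood += p / n_g              # strategy term
        total += math.log(likelihood)
    return total


def fit_single(links, steps=999):
    """Grid search for the p in (0, 1) that maximizes the log-likelihood."""
    grid = [(s + 1) / (steps + 1) for s in range(steps)]
    return max(grid, key=lambda p: log_likelihood_single(p, links))


# Synthetic data: 20 of 100 links point to one of 5 seen grandparents,
# with 1,000 candidates available for a random choice.
links = [(True, 5, 1000)] * 20 + [(False, 5, 1000)] * 80
p_hat = fit_single(links)
```

On this toy data the estimate lands close to the fraction of links that are grandparent shortcuts, slightly reduced because a few of those links could also have arisen by random choice.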

5.2 Combined Strategies 

For a more realistic model of the empirical data, let us consider
combined strategies with both triadic closure and traffic-based
shortcuts. For each link ℓ, the follower with probability p_1 creates
a shortcut by linking to a grandparent (G), an origin (O), or either
of them (G ∪ O); with probability p_2 follows a triadic node (Δ);
and with probability 1 − p_1 − p_2 connects to a random node.
Figure 6: Efficiency of links created according to different
mechanisms, or average number of messages (A) seen or (B) reposted
per time unit. Each box shows data within lower and
upper quartile. Whiskers represent the 99th percentile. The
triangle and line in a box represent the median and mean, respectively.
Note that the mean can fall outside the shown quantiles
for skewed distributions. The grey area and the black line
across the entire figure mark the interquartile range and the
median of the measure across all links, respectively.



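Taking the combined strategy with grandparent as an example,
we compute the log-likelihood as:

\[
\begin{aligned}
\log \mathcal{L}_{G+\Delta}(p_1, p_2)
&= \log \prod_{\ell=1}^{L} \bigl[\, p_1 f(\ell \mid G, \Theta) + p_2 f(\ell \mid \Delta, \Theta) + (1 - p_1 - p_2)\, f(\ell \mid \mathrm{Rand}, \Theta) \,\bigr] \\
&= \sum_{\substack{\mathbf{1}_G(\ell)=1 \\ \mathbf{1}_\Delta(\ell)=1}} \log \left[ \frac{p_1}{N_G(\ell)} + \frac{p_2}{N_\Delta(\ell)} + \frac{1 - p_1 - p_2}{\ell - k(\ell) - 1} \right] \\
&\quad + \sum_{\substack{\mathbf{1}_G(\ell)=1 \\ \mathbf{1}_\Delta(\ell)=0}} \log \left[ \frac{p_1}{N_G(\ell)} + \frac{1 - p_1 - p_2}{\ell - k(\ell) - 1} \right] \\
&\quad + \sum_{\substack{\mathbf{1}_G(\ell)=0 \\ \mathbf{1}_\Delta(\ell)=1}} \log \left[ \frac{p_2}{N_\Delta(\ell)} + \frac{1 - p_1 - p_2}{\ell - k(\ell) - 1} \right] \\
&\quad + \sum_{\substack{\mathbf{1}_G(\ell)=0 \\ \mathbf{1}_\Delta(\ell)=0}} \log \frac{1 - p_1 - p_2}{\ell - k(\ell) - 1}.
\end{aligned}
\]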

Once again, many follow actions can create both triadic closure 
links and traffic shortcuts, so they can contribute to multiple terms 
in the log-likelihood expression. 

It is hard to obtain the optimal solution analytically. We numerically
explore the values of p_1 and p_2 in the unit square (with
p_1 + p_2 ≤ 1) to maximize the log-likelihood. The best combined
strategy is the one considering both grandparents and origins as
well as triadic closure (see Fig. 8). The parameter settings and the
maximum likelihood values for all tested models are listed in
Table 1. We can compare the quality of these models by comparing
their maximized log-likelihoods.
The combined models with both traffic shortcuts and triadic closure
yield the best accuracy. In these models, triadic closure accounts
for 71% of the links, grandparents and origins for 12%, and the rest
are created at random.

Figure 7: Plot of the log-likelihood log L(p) as a function of link creation strategy probabilities for models with a single strategy.
The red circles mark the maximized log L(p).

Thus far we have assumed that each user has the same behavior; 
in the next section we model each user separately. 
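
The two-parameter search of §5.2 can be sketched in the same way as the single-strategy case. The function names, the tuple layout, the grid resolution, and the synthetic links below are all assumptions for illustration; `n_cand` again stands in for the number of potential candidates in the random-choice term.

```python
import math


def log_likelihood_combined(p1, p2, links):
    """Log-likelihood of a model mixing shortcuts, triadic closure, random.

    Each link is a tuple (is_s, n_s, is_t, n_t, n_cand): whether the target
    is a traffic shortcut, the number of shortcut candidates, whether it is
    a triadic node, the number of triadic nodes, and the number of users
    available for a random choice.
    """
    total = 0.0
    for is_s, n_s, is_t, n_t, n_cand in links:
        likelihood = (1.0 - p1 - p2) / n_cand  # random-choice term
        if is_s:
            likelihood += p1 / n_s             # traffic shortcut term
        if is_t:
            likelihood += p2 / n_t             # triadic closure term
        total += math.log(likelihood)
    return total


def fit_combined(links, steps=100):
    """Grid search over p1, p2 > 0 with p1 + p2 < 1."""
    best = None
    for a in range(1, steps):
        for b in range(1, steps - a):
            p1, p2 = a / steps, b / steps
            value = log_likelihood_combined(p1, p2, links)
            if best is None or value > best[0]:
                best = (value, p1, p2)
    return best[1], best[2], best[0]


# Synthetic data: 10 shortcut links, 70 triadic closure links, 20 others.
links = (
    [(True, 5, False, 50, 10000)] * 10
    + [(False, 5, True, 50, 10000)] * 70
    + [(False, 5, False, 50, 10000)] * 20
)
p1_hat, p2_hat, max_log_l = fit_combined(links)
```

Restricting the grid to p1 + p2 < 1 keeps the random-choice probability strictly positive, so every link has a non-zero likelihood and the logarithm is always defined.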

6. USER BEHAVIOR 

The MLE models for describing the system behavior can be similarly
employed to characterize the strategy of an individual user.
Let us focus on the model G ∪ O + Δ that best reproduces the
empirical data at the global level. We run MLE to explain the links
created by each user independently. We consider users with at least
20 in-links, such that MLE is meaningful. To facilitate the interpretation
of the parameters, let us call p_traffic = p_1, p_structure = p_2,
and p_random = 1 − p_1 − p_2. Each user has her own set of parameters.



Figure 8: The contour plot of the log-likelihood log L(p_1, p_2) for
the combined strategy of creating traffic shortcuts (G ∪ O) with
probability p_1 and triadic closure links (Δ) with probability p_2.
The black triangle marks the optimum.



Table 1: The best parameters in different models and the corresponding
values of the maximized log-likelihood function.

Strategy   Model        Parameters                 max log L
Baseline   Rand         -                          -3.75 × 10^7
Single     Δ            p = 0.82                   -3.15 × 10^7
           G            p = 0.19                   -3.64 × 10^7
           O            p = 0.17                   -3.65 × 10^7
           G ∪ O        p = 0.21                   -3.63 × 10^7
Combined   G + Δ        p_1 = 0.12, p_2 = 0.71     -3.12 × 10^7
           O + Δ        p_1 = 0.10, p_2 = 0.73     -3.13 × 10^7
           G ∪ O + Δ    p_1 = 0.12, p_2 = 0.71     -3.12 × 10^7


6.1 User Strategy Classification 

Using the Expectation-Maximization (EM) algorithm [11, 16],
users are clustered into several classes based on p_traffic, p_structure,
and p_random. EM iteratively performs an expectation step to compute
the probability that each instance belongs to each class, and a
maximization step in which latent variables of classes are altered to
maximize the expected likelihood of the observed data. The number
of clusters is chosen by cross-validation. This procedure
yields five classes:

Information-Oriented (Info): People prefer to follow someone 
from whom or through whom they have received messages. 

Friend of a Friend (Friend): People follow users two steps away 
to form triadic closure, almost exclusively. 

Casual Friendship (CFrd): People tend to follow a set of users 
their friends are following; they also link to random users 
occasionally. 

Mixture (Mix): Miscellaneous behavior of creating traffic short- 
cuts, connecting others by triadic closure, and following ran- 
dom people. 

Random Browsing (Rand): People have a much higher preference
for following a random user who is not close in either
the follower network or the message flow network. "Random"
does not necessarily imply the absence of any rule;
there can be other strategies not explored in our model, e.g.,
following a celebrity on purpose (similar to preferential
attachment).
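
The alternation of expectation and maximization steps described in this section can be sketched with a small mixture model. This is a simplified illustration, not the procedure used for the classification above: it assumes spherical Gaussian classes (the text does not specify the mixture family), fixes the number of classes instead of choosing it by cross-validation, and uses hypothetical function names and made-up user parameters.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def em_cluster(points, k, iterations=50):
    """Soft-cluster points with a spherical Gaussian mixture fitted by EM.

    Returns (means, resp), where resp[n][c] is the probability that
    point n belongs to class c.
    """
    dim = len(points[0])
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the means chosen so far.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda x: min(sq_dist(x, m) for m in means))
        means.append(list(far))
    variances = [0.05] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # Expectation step: class membership probabilities per point.
        resp = []
        for x in points:
            logs = [
                math.log(weights[c])
                - 0.5 * dim * math.log(2.0 * math.pi * variances[c])
                - sq_dist(x, means[c]) / (2.0 * variances[c])
                for c in range(k)
            ]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # Maximization step: update weight, mean and variance per class.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            if mass < 1e-12:
                continue  # leave an empty class unchanged
            weights[c] = mass / len(points)
            means[c] = [
                sum(r[c] * x[d] for r, x in zip(resp, points)) / mass
                for d in range(dim)
            ]
            spread = sum(r[c] * sq_dist(x, means[c]) for r, x in zip(resp, points))
            variances[c] = max(spread / (dim * mass), 1e-4)
    return means, resp


# Made-up users described by (p_traffic, p_structure, p_random).
users = [
    (0.52, 0.36, 0.12), (0.50, 0.40, 0.10), (0.55, 0.35, 0.10),
    (0.00, 0.96, 0.04), (0.02, 0.94, 0.04), (0.01, 0.95, 0.04),
]
means, resp = em_cluster(users, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
```

On these six points the first three users end up in one class and the last three in the other, mirroring the contrast between traffic-driven and triadic-closure-driven behavior.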



Table 2: Classes of user link creation strategy

Class       #Users    <p_traffic>   <p_structure>   <p_random>
All Users   45,708    0.07          0.77            0.17
Info         4,750    0.52          0.36            0.13
Friend      12,797    0.00          0.96            0.04
CFrd        23,469    0.01          0.80            0.19
Mix          2,524    0.07          0.63            0.30
Rand         2,168    0.09          0.32            0.59


Table 2 displays the parameter averages for users in each class,
representing the overall behavior pattern in that class. Users in the
mixture category behave similarly to the average across all users.
Fig. 9 illustrates how users in different classes are mapped into the
parameter space with the probability of each link creation strategy
as one dimension.

6.2 Characterization of User Classes 

To further differentiate users with different link creation strategies,
let us look at several structural and behavioral characteristics
of each class. Figs. 10(A-C) show how users in different classes
create social links by comparing p_traffic, p_structure, and p_random.

As shown in Fig. 10(D), information-oriented users have been
active longer than users in friendship classes. Similarly,
information-oriented users tend to follow more people (Fig. 10(E)).
Information-oriented users have even more followers compared to
friendship-driven users (Fig. 10(F)). This suggests that they tend to be more
influential, as confirmed by considering the number of times that
their messages are reposted (Fig. 10(G)). Friendship-driven users
follow a few people while essentially nobody is following them.
Such a passive role can be explained by their short lifetime. All of
these results are consistent with Fig. 4.

Finally, Figs. 10(H-I) suggest that, while information-oriented
users tend to produce more messages, their role is more that of
spreaders than producers of information compared to other classes.

7. CONCLUSION 

The study of the feedback loop between the dynamics of and on 
the network — how the network grows and how the information 
spreads — offers a promising framework for understanding social 
influence, user behavior, and network efficiency in the context of 
micro-blogging systems. 

The results presented in this paper show that while triadic clo- 
sure is the dominant mechanism for social network evolution, it is 
mainly relevant in the early stages of a user's lifetime. As time pro- 
gresses, the traffic generated by the dynamics of information flow 
on the network becomes an indispensable component for user link- 
ing behavior. As users become more active and influential, their 
links create shortcuts that make the spread of information more ef- 
ficient in the network. Users whose following behavior is driven by 
the information they see are a minority of the population, but play 
a key role in the information diffusion process. They produce more 
information, but, even more importantly, they act as spreaders of 
the information they collect widely across the network. 

We believe our findings apply generally to techno-social net- 
works, and in particular information diffusion networks and (mi- 
cro)blogs. Analyses of other micro-blogging systems, such as Twit- 
ter, would be needed to confirm this, but will be challenging due to 
the difficulty of obtaining full longitudinal data about user actions 
on the social network. 

We are looking at the possibility of sharing with the community
an anonymized dataset along with source code that would allow others to
reproduce and extend our analyses. A URL for download will be
included in the camera-ready version of the paper.

Figure 9: Ternary plot of users according to p_traffic, p_structure,
and p_random.
Figure 10: Various features of users in different classes. The lifetime of a user is measured by how many others join the system after